Environmental robust features for speech detection

Authors

  • Thomas Kemp
  • Climent Nadeu
  • Yin Hay Lam
  • Josep Maria Sola i Caros
Abstract

In this paper, two novel features, Line Spectrum Center Range and Line Spectrum Flux, both derived from Line Spectrum Frequencies, are proposed to detect the presence of speech in various acoustic environments. Evaluation results using Fisher Discriminant Analysis and scatter matrices indicated that the new features outperform the state-of-the-art features. An environmentally robust hybrid feature set, comprising the proposed features, Normalized Energy Dynamic Range, and Mel-Frequency Cepstrum Coefficients, is further introduced. When the hybrid feature set was evaluated on a Gaussian Mixture Model based classification engine, it outperformed Mel-Frequency Cepstrum Coefficients in terms of relative frame error rate.
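The abstract does not spell out how the features were scored or how the relative frame error rate was computed. As a rough illustration only, the sketch below assumes the common two-class Fisher criterion for a single scalar feature (squared distance between class means divided by the sum of class variances) and the usual definition of relative error reduction. The function names `fisher_ratio` and `relative_error_reduction`, and the sample values, are hypothetical and do not come from the paper.

```python
from statistics import fmean, pvariance


def fisher_ratio(speech, nonspeech):
    """Two-class Fisher criterion for one scalar feature (assumed form).

    Larger values mean the feature separates the two classes better.
    Raises ZeroDivisionError if both classes have zero variance.
    """
    mean_gap = fmean(speech) - fmean(nonspeech)
    within_class = pvariance(speech) + pvariance(nonspeech)
    return mean_gap ** 2 / within_class


def relative_error_reduction(baseline_fer, new_fer):
    """Relative reduction in frame error rate versus a baseline (assumed form)."""
    return (baseline_fer - new_fer) / baseline_fer


# Made-up per-frame feature values for the two classes.
speech_frames = [1.0, 2.0, 3.0]
nonspeech_frames = [7.0, 8.0, 9.0]
score = fisher_ratio(speech_frames, nonspeech_frames)

# Made-up frame error rates: baseline 20%, new system 15%.
gain = relative_error_reduction(0.20, 0.15)
```

A feature with a higher ratio would be ranked as more discriminative under this assumed criterion; the paper's actual analysis also uses scatter matrices, which generalize the same idea to several features at once.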

Similar resources

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have demonstrated their performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

Speech Modulation Features for Robust Nonnative Speech Accent Detection

In this paper, we propose to use speech modulation features for robust nonnative accent detection. The modulation spectrum carries long-term temporal information about speech and may discriminate between the accents of native and nonnative speakers. For each speech segment to be tested, we extract a 10-dimensional feature vector from the modulation spectrum and use it for model training and testing. The proposed modula...

Speech activity detection on YouTube using deep neural networks

Speech activity detection (SAD) is an important first step in speech processing. Commonly used methods (e.g., frame-level classification using Gaussian mixture models (GMMs)) work well under stationary noise conditions, but do not generalize well to domains such as YouTube, where videos may exhibit a diverse range of environmental conditions. One solution is to augment the conventional cepstral...

Missing features detection and handling for robust speaker verification

This paper addresses the problem of robust text-independent speaker verification in the presence of missing (masked by noise) features. It presents and assesses several missing-feature handling approaches. In these approaches, speech enhancement and missing-feature detection are based on the minimum mean-square error (MMSE) spectral amplitude estimator of Ephraim and Malah [1].

Improving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms

One of the important issues in speech emotion recognition is selecting appropriate feature sets in order to improve the detection rate and classification accuracy. In previous studies, researchers tried to select appropriate features for classification by using feature selection and feature-space reduction methods, such as Fisher and PCA. In this research, a hybrid evolutionary algorit...

Publication date: 2004